Past Event: Babuška Forum

From DAGs to Devices: Orchestrating kernels on modern computers

William Ruys, Postdoctoral Fellow, Oden Institute

10 – 11AM
Friday Sep 19, 2025

POB 6.304 and Zoom

Abstract

"Modern HPC platforms have evolved into complex heterogeneous ecosystems comprised of different computational resources (multi-core CPUs, GPUs, specialized accelerators) interconnected through complicated memory hierarchies and network topologies.
The programming challenge has shifted beyond simply writing performant and portable kernels. Developers must also orchestrate and schedule parts of their algorithms across the system to fully utilize its resources. Task-based parallel programming offers an elegant abstraction for this complexity. By representing computation as a directed graph of interdependent tasks, this approach delegates decisions about concurrency, memory management, data movement, task prioritization, and task-to-device mapping to a runtime system rather than burdening the programmer with these lower-level details.

The effectiveness of this approach hinges critically on policies for task-to-device mapping: the runtime’s ability to intelligently assign tasks to the appropriate processing units as execution unfolds. In a dynamic, online setting, the mappings must adapt to varying task characteristics and changing memory pressure, balancing load against data redistribution costs.

This talk provides an overview, starting from an introduction to task-parallel programming. I will cover core concepts, programming models, and, naturally, extensive examples with a bias toward our own library for task parallelism. We then explore both classical and new approaches to the task mapping problem.

The online device assignment problem is framed as a sequential decision-making process under uncertainty. Starting with established heuristics and rule-based strategies, we examine where they fail and review recent advances in machine-learning-driven approaches. We present our recent work on applying on-policy reinforcement learning to map communication-dominated DAGs with dynamic input sizes and task durations. Focusing on DAGs arising from 2D stencil applications, we present our discrete-event runtime simulation for on-node heterogeneous parallelism, its integration with TorchRL for reinforcement learning, and some current results and open challenges."

Biography

Dr. Ruys is a Postdoctoral Fellow at the Oden Institute for Computational Engineering and Sciences at The University of Texas at Austin, where he collaborates with George Biros, Mattan Erez, and Jaeyoung Park on developing reinforcement learning approaches for adaptable runtime systems. He earned his Ph.D. in Computational Science, Engineering, and Mathematics from UT Austin in 2024, focusing on distributed algorithms for nearest neighbor graph construction, Python-based solutions for task parallelism, and image registration. His research bridges the critical gap between HPC system capabilities and application demands, with a focus on enhancing both programmability and performance in heterogeneous computing environments.

Event information

Hosted by Boyuan (John) Yao

Admin events@oden.utexas.edu